
Take Home Example
Data Mining @ Yi-Shin Chen 53
©Tan, Steinbach, Kumar Introduction to Data Mining
P(Refund=Yes|No) = 3/7
P(Refund=No|No) = 4/7
P(Refund=Yes|Yes) = 0
P(Refund=No|Yes) = 1
P(Marital Status=Single|No) = 2/7
P(Marital Status=Divorced|No) = 1/7
P(Marital Status=Married|No) = 4/7
P(Marital Status=Single|Yes) = 2/3
P(Marital Status=Divorced|Yes) = 1/3
P(Marital Status=Married|Yes) = 0
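Each categorical estimate above is a relative frequency within one class: the number of records in the class that have the attribute value, divided by the number of records in the class. A minimal Python sketch, assuming the seven Class=No records have the Marital Status counts implied by the fractions above (2 Single, 1 Divorced, 4 Married). The list is reconstructed from those fractions, not copied from a data table, and the function name is illustrative:

```python
from collections import Counter
from fractions import Fraction

def conditional_probs(values):
    """Estimate P(attribute=value | class) as a relative frequency.

    `values` holds the attribute value of every training record in one class.
    """
    counts = Counter(values)
    total = len(values)
    return {value: Fraction(count, total) for value, count in counts.items()}

# Assumed Marital Status values of the 7 Class=No records (reconstructed
# from the stated fractions, not taken from an actual data table).
marital_no = ["Single"] * 2 + ["Divorced"] * 1 + ["Married"] * 4

probs_no = conditional_probs(marital_no)
for value, p in probs_no.items():
    print(f"P(Marital Status={value}|No) = {p}")
```

Exact fractions are used so that the estimates for one attribute within one class can be checked to sum to 1.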
For taxable income:
If class=No: sample mean=110
sample variance=2975
If class=Yes: sample mean=90
sample variance=25
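For a continuous attribute such as taxable income, the class-conditional likelihood is commonly taken from a normal density with the class's sample mean and variance. A minimal sketch under that assumption (the function name is illustrative, and income is expressed in thousands):

```python
import math

def gaussian_density(x, mean, variance):
    """Normal density with the given mean and variance, evaluated at x."""
    coeff = 1.0 / math.sqrt(2.0 * math.pi * variance)
    return coeff * math.exp(-((x - mean) ** 2) / (2.0 * variance))

# Likelihood of Income=120K under each class, using the sample statistics above.
p_income_no = gaussian_density(120, mean=110, variance=2975)
p_income_yes = gaussian_density(120, mean=90, variance=25)

print(f"P(Income=120K|No)  = {p_income_no:.4f}")
print(f"P(Income=120K|Yes) = {p_income_yes:.1e}")
```

Under this assumption the two values come out at roughly 0.0072 and 1.2×10^-9, the figures used for the test record in this example.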
Naïve Bayes Classifier:
Given a test record: X = (Refund=No, Married, Income=120K)
P(X|Class=No) = P(Refund=No|Class=No)
× P(Married|Class=No)
× P(Income=120K|Class=No)
= 4/7 × 4/7 × 0.0072 = 0.0024
P(X|Class=Yes) = P(Refund=No|Class=Yes)
× P(Married|Class=Yes)
× P(Income=120K|Class=Yes)
= 1 × 0 × 1.2×10^-9 = 0
Since P(X|No)P(No) > P(X|Yes)P(Yes),
P(No|X) > P(Yes|X)
=> Class = No
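For the test record X = (Refund=No, Married, Income=120K), the comparison can be reproduced numerically. A minimal sketch, assuming a normal density for income and class priors of 7/10 and 3/10. The priors are not stated in this example; they are an assumption based on seven No records out of an assumed ten, and the conclusion does not depend on them because P(X|Yes) is exactly 0:

```python
import math

def gaussian_density(x, mean, variance):
    """Normal density with the given mean and variance, evaluated at x."""
    coeff = 1.0 / math.sqrt(2.0 * math.pi * variance)
    return coeff * math.exp(-((x - mean) ** 2) / (2.0 * variance))

# P(X|Class) as a product of per-attribute conditional probabilities.
likelihood = {
    "No": (4 / 7) * (4 / 7) * gaussian_density(120, 110, 2975),
    "Yes": 1.0 * 0.0 * gaussian_density(120, 90, 25),
}

# Assumed class priors (not given in the example).
prior = {"No": 7 / 10, "Yes": 3 / 10}

# Compare P(X|Class) * P(Class); the larger score gives the predicted class.
score = {c: likelihood[c] * prior[c] for c in likelihood}
predicted = max(score, key=score.get)

print(f"P(X|No)  = {likelihood['No']:.5f}")
print(f"P(X|Yes) = {likelihood['Yes']:.5f}")
print("Predicted class:", predicted)
```

The unrounded product for Class=No is about 0.00235, consistent with the 0.0024 in the example, which multiplies by the rounded density 0.0072.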
Naïve Bayesian Classifier: Comments
Advantages:
Easy to implement
Good results obtained in most cases
Disadvantages:
Assumption: class-conditional independence
In practice, dependencies exist among variables
E.g., hospital patients: Profile (age, family history, etc.), Symptoms (fever, cough, etc.), Disease (lung cancer, diabetes, etc.)
Dependencies among these cannot be modeled by a Naïve Bayesian Classifier
How to deal with these dependencies?
Bayesian Belief Networks
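One way to see the cost of the independence assumption is to feed the classifier two perfectly dependent attributes. A minimal sketch with hypothetical numbers (the class names, priors, and likelihoods are chosen only for illustration and do not come from any data set): an attribute that is an exact copy of another adds no information, yet its evidence is multiplied in a second time.

```python
def naive_posterior(prior, feature_likelihoods):
    """Posterior over classes when per-attribute likelihoods are multiplied."""
    score = dict(prior)
    for likelihood in feature_likelihoods:
        for c in score:
            score[c] *= likelihood[c]
    total = sum(score.values())
    return {c: s / total for c, s in score.items()}

# Hypothetical numbers, chosen only for illustration.
prior = {"disease": 0.5, "healthy": 0.5}
fever = {"disease": 0.8, "healthy": 0.4}  # P(fever=yes | class)

# Posterior from the single attribute.
once = naive_posterior(prior, [fever])
# Same attribute listed twice: an exact duplicate carries no new information,
# but the independence assumption counts its evidence again.
twice = naive_posterior(prior, [fever, fever])

print(f"one attribute:        P(disease|X) = {once['disease']:.3f}")
print(f"duplicated attribute: P(disease|X) = {twice['disease']:.3f}")
```

The posterior moves from about 0.667 to 0.8 although nothing new was observed, which is the kind of distortion that a model of the dependencies is meant to avoid.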